Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

Authors

  • Ali Rezaee, Department of Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran
  • Avishan Sharafi, Department of Computer Engineering, Islamic Azad University, South Tehran Branch

Abstract

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Hadoop and the rack-aware data placement strategy of the Hadoop Distributed File System (HDFS) assume a homogeneous cluster, in which every node has the same computing capacity and is assigned the same workload. Default Hadoop does not consider the load state of each node when distributing input data blocks, which may cause unnecessary overhead. In heterogeneous environments, such a data placement policy can noticeably reduce MapReduce performance and may increase energy dissipation. This paper proposes a resource-aware adaptive dynamic data placement (ADDP) algorithm. ADDP resolves the unbalanced node workload problem by dynamically adapting and balancing the data stored on each node according to that node's load status in a heterogeneous Hadoop cluster. Experimental results show that data transfer overhead decreases in comparison with the DDP and traditional Hadoop algorithms. Moreover, the proposed method can decrease execution time and improve system throughput by increasing resource utilization.
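
The abstract does not state how ADDP computes its placement, so the following is only an illustrative sketch of the general idea of load-aware placement, not the authors' algorithm. It assumes a hypothetical per-node score equal to computing capacity multiplied by the idle fraction (1 - load), and splits a file's blocks in proportion to that score. The function name `allocate_blocks` and its parameters are invented for this example.

```python
from typing import Dict


def allocate_blocks(total_blocks: int,
                    capacity: Dict[str, float],
                    load: Dict[str, float]) -> Dict[str, int]:
    """Split total_blocks across nodes in proportion to spare capacity.

    Assumption (not from the paper): score = capacity * (1 - load),
    where load is a fraction in [0, 1].
    """
    # Hypothetical score: faster and less loaded nodes get a larger share.
    scores = {n: capacity[n] * max(0.0, 1.0 - load.get(n, 0.0))
              for n in capacity}
    total_score = sum(scores.values())
    if total_score <= 0:
        raise ValueError("no node has spare capacity")

    # Exact fractional share of blocks for each node.
    exact = {n: total_blocks * s / total_score for n, s in scores.items()}
    # Round down first, then hand out the remaining blocks to the nodes
    # with the largest fractional remainders so the total is preserved.
    alloc = {n: int(share) for n, share in exact.items()}
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


if __name__ == "__main__":
    # Three nodes with relative capacities 1:2:3 and no current load.
    print(allocate_blocks(12, {"a": 1, "b": 2, "c": 3},
                          {"a": 0.0, "b": 0.0, "c": 0.0}))
```

With equal capacities and equal load this reduces to an even split, which corresponds to the homogeneous assumption the abstract attributes to default Hadoop.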

Similar resources

An adaptive scheduling algorithm for dynamic heterogeneous Hadoop systems

The MapReduce and Hadoop frameworks were designed to support efficient large scale computations. There has been growing interest in employing Hadoop clusters for various diverse applications. A large number of (heterogeneous) clients, using the same Hadoop cluster, can result in tensions between the various performance metrics by which such systems are measured. On the one hand, from the servic...

An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster

Hadoop Distributed File System (HDFS) is designed to store big data reliably, and to stream these data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous, and places blocks randomly without considering any node's resource characteristics, which decreases the self-adaptability of the system. In this paper, we ...

Data Placement Strategy for Hadoop Clusters

Wireless technology has become very widely used; and an array of security measures, such as authentication, confidentiality strategies, and 802.11 wireless communication protocol based security schemas have been proposed and applied to real-time wireless networks. However, most of the measures only consider security issues in static mode, in which security levels are all configured when wireles...

Heterogeneous Neural Networks for Adaptive Behavior in Dynamic Environments

Leon S. Sterling, CS Dept. & CAISR, CWRU. Research in artificial neural networks has generally emphasized homogeneous architectures. In contrast, the nervous systems of natural animals exhibit great heterogeneity in both their elements and patterns of interconnection. This heterogeneity is crucial to the flexible generation of behavior which is essential for survival in a complex, dynamic environm...

Intelligent Block Placement Strategy in Heterogeneous Hadoop Clusters

MapReduce is an important distributed processing model for large-scale data-intensive applications. As an open-source implementation of MapReduce, Hadoop provides enterprises with a cost-efficient solution for their analytics needs. However, the default HDFS block placement policy assumes that computing nodes in a cluster are homogeneous, and tries to balance load by placing blocks randomly, wh...

A Hybrid Dynamic Load Balancing Algorithm for Heterogeneous Environments

Dynamic load balancing has become a necessity in emerging distributed environments to address the inherent heterogeneity in computing resources. The majority of traditional dynamic load balancing algorithms were developed assuming a homogeneous set of nodes and suffer a significant performance drop under a variety of workloads. Moreover, the centralized dynamic approach limits the scalability with th...

Journal title

volume 2, issue 4

pages 17-30

publication date 2016-12-01
